Statistical Corpus Analysis for KT-TREASURE: Korea Telecom Train Ticket Reservation Aid System Based upon Speech Recognition

Author

  • Woosung Kim
Abstract

This paper describes the results of a statistical analysis of the corpus for KT-TREASURE (Korea Telecom Train ticket REservation Aid System based Upon speech REcognition). At the start of development, two sets of speech corpus were collected: one based on human-human (H-H) dialogues and the other based on human-computer (H-C) dialogues. A Wizard of Oz (WOZ) experiment was carried out to collect the speech corpus based on H-C spoken dialogue. Linguistic analysis results show that people respond differently when talking to a computer than when talking to a human. Since the basic unit of grammar in Korean is the morpheme, a morpheme-based Korean language model was designed in addition to a word-based language model. We also defined a subword unit that lies between the word and the morpheme, and constructed a subword-based language model. Language-model analysis results reveal that a morpheme-based language model gives a 50% reduction in perplexity (PP) over a word-based one. They also show that a morpheme-based language model is the least affected by vocabulary reduction.
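The word-based versus morpheme-based comparison can be illustrated with a minimal sketch. It is not the KT-TREASURE setup: it assumes a tiny hypothetical, hand-segmented set of reservation-style utterances (spaces separate words, `+` marks morpheme boundaries), an add-one (Laplace) smoothed bigram model as a stand-in for the paper's language models, and hypothetical helper names (`word_tokens`, `morpheme_tokens`, `bigram_perplexity`). It reports test-set perplexity, PP = exp(-(1/N) Σ log p), and the out-of-vocabulary (OOV) count for each unit. The paper's subword unit is not modeled because its definition is not given here.

```python
import math
from collections import Counter

# Hypothetical hand-segmented utterances (NOT from the KT-TREASURE corpus):
# spaces separate words (eojeol), "+" separates morphemes inside a word.
TRAIN = [
    "서울+에서 부산+까지 표+를 예약+하+고 싶+어요",
    "부산+에서 서울+까지 표+를 예약+하+고 싶+어요",
    "대전+까지 표+를 예약+하+고 싶+어요",
]
TEST = ["대전+에서 부산+까지 표+를 예약+하+고 싶+어요"]


def word_tokens(line):
    """Word-based units: remove morpheme boundaries, keep whole words."""
    return [w.replace("+", "") for w in line.split()]


def morpheme_tokens(line):
    """Morpheme-based units: split every word at its '+' boundaries."""
    return [m for w in line.split() for m in w.split("+")]


def bigram_perplexity(train, test, tokenize):
    """Add-one smoothed bigram perplexity and OOV count on the test set.

    Test tokens not seen in training are mapped to <unk>.
    """
    vocab = {t for line in train for t in tokenize(line)}

    def seq(line):
        toks = [t if t in vocab else "<unk>" for t in tokenize(line)]
        return ["<s>"] + toks + ["</s>"]

    histories, bigrams = Counter(), Counter()
    for line in train:
        s = seq(line)
        histories.update(s[:-1])              # counts of each history token
        bigrams.update(zip(s[:-1], s[1:]))    # counts of (history, next) pairs

    v = len(vocab) + 2  # possible next tokens: vocabulary plus <unk> and </s>
    log_prob, n_events, oov = 0.0, 0, 0
    for line in test:
        s = seq(line)
        oov += s.count("<unk>")
        for prev, cur in zip(s[:-1], s[1:]):
            p = (bigrams[(prev, cur)] + 1) / (histories[prev] + v)
            log_prob += math.log(p)
            n_events += 1
    return math.exp(-log_prob / n_events), oov


if __name__ == "__main__":
    for name, tok in [("word", word_tokens), ("morpheme", morpheme_tokens)]:
        pp, oov = bigram_perplexity(TRAIN, TEST, tok)
        print(f"{name}: perplexity={pp:.2f}, OOV tokens={oov}")
```

In this toy data the unseen word 대전에서 becomes an OOV token for the word-based model, while both of its morphemes (대전, 에서) already occur in the morpheme vocabulary, which is the kind of robustness to vocabulary limits the abstract describes. Per-token perplexities over different units are not directly comparable, and a corpus this small is not expected to reproduce the 50% reduction reported above.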

Similar resources

A Korean speech corpus for train ticket reservation aid system based on speech recognition

This paper describes the Korean speech corpus for a train ticket reservation aid system based on speech recognition. Two sets of speech corpus were collected. One was based on human-human (H-H) dialogues and the other was based on human-computer (H-C) dialogues. A WOZ (Wizard of Oz) experiment was carried out to collect speech corpus based on H-C spoken dialogue. A total of 298 speaker data was collec...


KT-STS: a speech translation system for hotel reservation and a continuous speech recognition system for speech translation

In this paper, we present KT-STS (Korea Telecom Speech Translation System) and a continuous speech recognition system for speech translation. KT-STS is an experimental speech-to-speech translation system which translates a spoken utterance in Korean into one in Japanese. The system has been designed around the task of hotel reservation (dialogues between a Korean customer and a hotel reservation...


Towards best practice in the development and evaluation of speech recognition components of a spoken language dialog system

Spoken Language Dialog Systems (SLDSs) aim to use natural spoken input for performing an information processing task such as call routing or train ticket reservation (Lamel et al., 1995). The main functionalities of an SLDS are speech recognition, natural language understanding, dialog management, response generation and speech synthesis. This article summarizes key aspects of the current pra...


Using Context-based Statistical Models to Promote the Quality of Voice Conversion Systems

This article examines methods of optimizing the performance of GMM-based voice conversion systems, in which the GMM method is introduced as the basic method for improving voice conversion performance. In current methods, due to the use of a single conversion function to convert all speech units and the resulting spectral smoothing arising from statistical averaging, we will observe quality ...


A'lam Corpus: A Standard Named-Entity Corpus for the Persian Language

Named entity recognition (NER) is a natural language processing (NLP) problem that is mainly used for text summarization, data mining, data retrieval, question answering, machine translation, and document classification systems. An NER system is tasked with determining the border of each named entity, recognizing its type and classifying it into predefined categories. The categories of named...


Publication date: 1997